fix(vcs): prevent crash when repo has thousands of untracked files#33927
Open
youtsuhodev wants to merge 4 commits into
Open
fix(vcs): prevent crash when repo has thousands of untracked files#33927youtsuhodev wants to merge 4 commits into
youtsuhodev wants to merge 4 commits into
Conversation
Problem ------- When a Git repository contains a large number of untracked files (e.g. 1200+), the VCS layer would crash the application. This typically occurs in two scenarios: 1. A freshly initialized project (git init) with no initial commit and no .gitignore — every single file in the working tree is untracked. 2. A cloned project whose .gitignore is missing or incomplete, causing build artifacts (node_modules, dist/, etc.) to appear as untracked. Root cause ---------- Two distinct issues compounded to produce the crash: Issue A — --untracked-files=all enumerates every file individually ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Git.status() ran git status --untracked-files=all, which instructs git to recurse into every untracked directory and list each file individually in the porcelain output. For a repo with a single untracked directory containing 1200 files, this produces 1200 separate ?? path/to/file entries. The downstream code in Vcs.status() and Vcs.diff() then iterates over every entry and, for each untracked file, spawns one or more additional git subprocesses (git diff --no-index --numstat for stats, git diff --no-index --patch for the diff). With 1200+ files, this creates thousands of sequential git subprocesses, overwhelming the system and causing the application to crash or hang indefinitely. Note that --untracked-files=all is actually *not* git's default behaviour. Since Git 1.7.0 the default has been --untracked-files=normal, which collapses entire untracked directories into a single ?? dir/ entry instead of expanding them into individual file entries. Issue B — no upper bound on file count ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Even with --untracked-files=normal, a repository with thousands of individually-modified tracked files could still reach the same crash path. There was no safety limit anywhere in the pipeline: the full list was always processed unconditionally. Fix --- Two changes, one for each root cause: 1. Switch from --untracked-files=all to --untracked-files=normal (packages/opencode/src/git/index.ts) This makes git's status output collapse untracked directories into a single entry. For example, if ode_modules/ is untracked and contains 1200 files, the output will contain one line (?? node_modules/) instead of 1200 individual file entries. Directory entries (those ending with '/') are then filtered out by the existing flatMap callback, since they cannot be processed as individual files — you cannot compute a diff or stats for a directory as a whole. In normal day-to-day development, where untracked files appear inside tracked directories (e.g. src/new-component.ts), --untracked-files=normal behaves identically to --untracked-files=all — it still lists each file individually. The difference only manifests for *entirely* untracked directories, which is exactly where the crash occurs. 2. Add MAX_STATUS_FILES safety limit (packages/opencode/src/project/vcs.ts) A new constant MAX_STATUS_FILES = 2000 caps the number of items processed by both Vcs.status() and the internal iles() helper (used by Vcs.diff()). If the list exceeds this threshold, it is silently truncated to the first 2000 entries. This is a defence-in-depth measure: even if git.status() were to return more items than expected, the VCS layer will never attempt to spawn subprocesses for all of them. Impact ------ - The application no longer crashes on open when the working tree contains many untracked files. - The review/change panel shows partial results when the number of changed files exceeds MAX_STATUS_FILES, which is vastly preferable to a hard crash that forces the user to delete all app data. - The fix is entirely in the backend VCS pipeline; no frontend changes are required.
Contributor
|
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details. |
Contributor
|
Thanks for updating your PR! It now meets our contributing guidelines. 👍 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue for this PR
Closes #33928
Type of change
What does this PR do?
When a Git repository contains a large number of untracked files (e.g. 1200+), the VCS layer crashes the application. This typically occurs in two scenarios:
ode_modules, \dist/, etc.) to appear as untracked.
Two distinct issues compounded to produce the crash:
Issue A — --untracked-files=all\ enumerates every file individually
\Git.status()\ ran \git status --untracked-files=all, which instructs git to recurse into every untracked directory and list each file individually in the porcelain output. For a repo with a single untracked directory containing 1200 files, this produces 1200 separate ?? path/to/file\ entries.
The downstream code in \Vcs.status()\ and \Vcs.diff()\ then iterates over every entry and, for each untracked file, spawns one or more additional git subprocesses (\git diff --no-index --numstat\ for stats, \git diff --no-index --patch\ for the diff). With 1200+ files, this creates thousands of sequential git subprocesses, overwhelming the system and causing the application to crash or hang indefinitely.
Note that --untracked-files=all\ is actually not git's default behaviour. Since Git 1.7.0 the default has been --untracked-files=normal, which collapses entire untracked directories into a single ?? dir/\ entry instead of expanding them into individual file entries.
Issue B — no upper bound on file count
Even with --untracked-files=normal, a repository with thousands of individually-modified tracked files could still reach the same crash path. There was no safety limit anywhere in the pipeline: the full list was always processed unconditionally.
Fix
Two changes, one for each root cause:
*Switch from --untracked-files=all\ to --untracked-files=normal* (\packages/opencode/src/git/index.ts)
This makes git's status output collapse untracked directories into a single entry. For example, if
ode_modules/\ is untracked and contains 1200 files, the output will contain one line (?? node_modules/) instead of 1200 individual file entries.
Directory entries (those ending with /) are then filtered out by the existing \latMap\ callback, since they cannot be processed as individual files — you cannot compute a diff or stats for a directory as a whole.
In normal day-to-day development, where untracked files appear inside tracked directories (e.g. \src/new-component.ts), --untracked-files=normal\ behaves identically to --untracked-files=all\ — it still lists each file individually. The difference only manifests for entirely untracked directories, which is exactly where the crash occurs.
Add \MAX_STATUS_FILES\ safety limit (\packages/opencode/src/project/vcs.ts)
A new constant \MAX_STATUS_FILES = 2000\ caps the number of items processed by both \Vcs.status()\ and the internal \iles()\ helper (used by \Vcs.diff()). If the list exceeds this threshold, it is silently truncated to the first 2000 entries.
This is a defence-in-depth measure: even if \git.status()\ were to return more items than expected, the VCS layer will never attempt to spawn subprocesses for all of them.
Impact
How did you verify your code works?
Screenshots / recordings
N/A — backend change only.
Checklist